An interpretable online prediction method for remaining useful life of lithium-ion batteries

Accurate prediction of the remaining useful life (RUL) of lithium-ion batteries is advantageous for maintaining the stability of electrical systems. In this paper, an interpretable online method which can reflect capacity regeneration is proposed to accurately estimate the RUL. Firstly, four health indicators (HIs) are extracted from the charging and discharging process for online prediction. Then, the HIs model is trained using support vector regression to obtain future features. And the capacity model of Gaussian process regression (GPR) is trained and analyzed by Shapley additive explanation (SHAP). Meanwhile, the state space for capacity prediction is constructed with the addition of Gaussian non-white noise to simulate the capacity regeneration. And the modified predicted HIs and noise are obtained by unscented Kalman filter. Finally, according to SHAP explainer, the predicted HIs acting as the baseline and the modified HIs containing information on capacity regeneration are chosen to predict RUL. In addition, the bounds of confidence intervals (CIs) are calculated separately to reflect the regenerated capacity. The experimental results demonstrate that the proposed online method can achieve high accuracy and effectively capture the capacity regeneration. The absolute error of failure RUL is below 5 and the minimum confidence interval is only 2.

In recent years, there have been a variety of hybrid methods, with two major categories [15]. The first type of hybrid methods is a combination of model-based and data-driven methods, which usually consists of filtering methods with equivalent circuit models and empirical degradation models for RUL prediction [16]. The battery model serves as the state equation, the output of the data-driven method is used as the observation, and the parameters of the model can be updated by the filtering to ensure the prediction accuracy [17,18]. In general, the initialization of model parameters is set empirically, which can significantly impact the convergence performance of the algorithm. The second type of hybrid methods is a combination of multiple data-driven approaches, which achieve RUL prediction by integrating different algorithms [19,20]. Wavelet transform, empirical mode decomposition and other techniques are commonly used in these hybrid methods to achieve data deconstruction, at the cost of greater algorithmic complexity [21,22]. These methods prioritize enhancing the prediction accuracy of RUL for offline conditions or online step prediction. However, they may not be able to respond promptly when the battery reaches EOL prematurely. For the online prediction, the capacity can be estimated after the HIs are predicted by hybrid methods. But the predicted HIs can hardly reflect the complete information, especially the capacity regeneration that can cause sudden fluctuations in capacity, which makes online estimation difficult [23].
In addition, there is a concern about the interpretable analysis of data-driven methods, which can provide insights into black-box approaches. The interpretable analysis typically enables greater transparency in the decision-making process of the model. It can make the results of machine learning models more reliable and make the mechanism of the model more compatible with prior knowledge [24]. Generally speaking, the interpretation methods are classified into pre-model, in-model and post-model approaches [25]. Among them, post-model approaches, such as the Shapley additive explanation (SHAP), are able to evaluate the contribution of all features [26]. Therefore, SHAP analysis is usually used to quantify feature contributions to the model and to deepen the model interpretation [27]. However, as an interpretation method, SHAP analysis cannot reflect causality, and it is also challenging to enhance the data-driven method based on the analysis.
Given the aforementioned challenges, an interpretable online prediction method is proposed. Firstly, the proposed method extracts HIs from the batteries' current, voltage and temperature signals to fully describe the battery aging. Secondly, the HIs are quantified in terms of importance according to the grey relation analysis (GRA) method. The determined HIs are trained through support vector regression (SVR) for the HIs model [28]. The HIs predicted by SVR serve as online inputs for both the capacity model of GPR and the state space of capacity prediction. On the one hand, capacity prediction based on the trained GPR model is followed by SHAP analysis. The HIs are ranked by contribution and divided into two groups, with the criterion being more than 50% of the total contributions. On the other hand, the state quantities of the state space are defined as the HIs with added Gaussian non-white noise, which simulates the capacity regeneration that may occur suddenly at any time, and the observation quantity is the predicted capacity. An unscented Kalman filter (UKF) is used to modify the HIs and the Gaussian non-white noise at each predicted time. Thirdly, the original predicted HIs are combined with the modified HIs based on the indexes obtained from the SHAP analysis. The original predicted HIs with greater contribution are selected as features to ensure the baseline of the capacity predicted by GPR, while the modified predicted HIs with lesser contribution are selected to provide the capacity regeneration information. Finally, the capacity can be predicted from the combined HIs and the GPR model, and the online prediction of RUL is achieved. It is noted that in the proposed method, the two bounds of the prediction confidence intervals (CIs) are obtained in different ways. The lower bound is obtained directly by calculating the 95% confidence interval from the final predicted capacity, while the upper bound is calculated from the interval obtained after adding the modified Gaussian non-white noise to the final predicted capacity, in order to characterize the capacity regeneration.
The main contributions of the paper are as follows: (1) Four HIs are constructed for online prediction to comprehensively characterize the battery aging, and the battery capacity is initially estimated online based on SVR and GPR. (2) A state space for capacity prediction is constructed with the addition of a Gaussian non-white noise for the capacity regeneration, and the predicted HIs and the artificial noise are modified by UKF. (3) SHAP analysis is developed to select appropriate HIs, ensuring the accuracy of capacity prediction while characterizing the capacity regeneration. (4) The upper and lower bounds of the CI are obtained in two different ways: the lower bound ensures the reliability of the RUL prediction, while the upper bound can show the capacity regeneration.
The rest of this paper is organized as follows: Section II proposes the interpretable hybrid method for online prediction of RUL. Section III introduces the extracted HIs and the corresponding quantitative analysis. Section IV presents the experimental results of the proposed method based on the NASA data sets. The conclusions are presented in Section V.

Methodology

Theoretical foundations
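Support vector regression
SVR is a frequently utilized technique for addressing nonlinear regression problems with small data [29]. The strong generalization ability of this method is achieved by minimizing the structural risk, so the implied statistical information can be well mined even when the sample size is small [30]. The algorithm structure is shown as follows.
The training sample $D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)\}$ is first determined, where $x_i \in \mathbb{R}^d$ is the d-dimensional feature vector, $y_i \in \mathbb{R}$ is the target output, and the total number of training samples is m. Based on the training sample, the objective of SVR is to obtain a regression model that makes f(x) as close as possible to y. The SVR function can be described as follows:
$$f(x) = \omega^{T} x + b \tag{1}$$
where ω is the weight vector and b is the bias value.
In SVR, the error ε is defined as the difference between f(x) and y, and the maximum tolerated error is set to $\varepsilon_{\max}$. Only when $|\varepsilon|$ exceeds $\varepsilon_{\max}$ is the error counted in the loss. In other words, the prediction is accepted when the training data lie within the band of width $2\varepsilon_{\max}$ centered at f(x). Therefore, the objective of SVR can be expressed as follows:
$$\min_{\omega, b} \; \frac{1}{2}\|\omega\|^{2} + C \sum_{i=1}^{m} \ell_{\varepsilon}\big(f(x_i) - y_i\big) \tag{2}$$
where C is the penalty factor and $\ell_{\varepsilon}$ is the insensitive loss function, which is as follows.
$$\ell_{\varepsilon}(z) = \begin{cases} 0, & |z| \le \varepsilon_{\max} \\ |z| - \varepsilon_{\max}, & \text{otherwise} \end{cases} \tag{3}$$
Since errors beyond the band occur in actual problems, slack variables $\xi_i$ and $\hat{\xi}_i$ are introduced, which rewrites Eq. 2 as the following Eq. 4.
$$\min_{\omega, b, \xi_i, \hat{\xi}_i} \; \frac{1}{2}\|\omega\|^{2} + C \sum_{i=1}^{m} \big(\xi_i + \hat{\xi}_i\big) \tag{4}$$
The constraints of Eq. 4 are shown below.
$$\text{s.t.} \quad f(x_i) - y_i \le \varepsilon_{\max} + \xi_i, \quad y_i - f(x_i) \le \varepsilon_{\max} + \hat{\xi}_i, \quad \xi_i \ge 0, \; \hat{\xi}_i \ge 0, \quad i = 1, 2, \ldots, m \tag{5}$$
Then, the Lagrange multipliers $\mu_i \ge 0$, $\hat{\mu}_i \ge 0$, $\alpha_i \ge 0$, $\hat{\alpha}_i \ge 0$ are added to give the Lagrangian function shown in Eq. 6:
$$L = \frac{1}{2}\|\omega\|^{2} + C \sum_{i=1}^{m} \big(\xi_i + \hat{\xi}_i\big) - \sum_{i=1}^{m} \mu_i \xi_i - \sum_{i=1}^{m} \hat{\mu}_i \hat{\xi}_i + \sum_{i=1}^{m} \alpha_i \big(f(x_i) - y_i - \varepsilon_{\max} - \xi_i\big) + \sum_{i=1}^{m} \hat{\alpha}_i \big(y_i - f(x_i) - \varepsilon_{\max} - \hat{\xi}_i\big) \tag{6}$$
After taking the partial derivatives of Eq. 6 with respect to ω, b, $\xi_i$ and $\hat{\xi}_i$ and setting them to zero, the new objective of SVR can be obtained as shown in Eq. 7.
$$\max_{\alpha, \hat{\alpha}} \; \sum_{i=1}^{m} \Big[ y_i (\hat{\alpha}_i - \alpha_i) - \varepsilon_{\max} (\hat{\alpha}_i + \alpha_i) \Big] - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} (\hat{\alpha}_i - \alpha_i)(\hat{\alpha}_j - \alpha_j)\, x_i^{T} x_j, \quad \text{s.t.} \; \sum_{i=1}^{m} (\hat{\alpha}_i - \alpha_i) = 0, \; 0 \le \alpha_i, \hat{\alpha}_i \le C \tag{7}$$
From the constraints, the SVR function can be obtained as follows:
$$f(x) = \sum_{i=1}^{m} (\hat{\alpha}_i - \alpha_i)\, x_i^{T} x + b \tag{8}$$
Considering the nonlinear feature mapping, the function can be written as follows:
$$f(x) = \sum_{i=1}^{m} (\hat{\alpha}_i - \alpha_i)\, \kappa(x, x_i) + b \tag{9}$$
where κ is the kernel function. In this paper, the common radial basis kernel function is chosen for SVR. The kernel function is listed as follows.
$$\kappa(x_i, x_j) = \exp\left(-\frac{\|x_i - x_j\|^{2}}{2\sigma^{2}}\right) \tag{10}$$
where σ is the kernel width.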
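As a small illustration of the pieces described above, the following sketch evaluates the radial basis kernel, the insensitive loss, and the kernel form of the SVR function. It is a minimal sketch under stated assumptions: the dual coefficients and bias are supplied by hand rather than solved for, and the function names and toy numbers are hypothetical, not taken from the paper.

```python
import math

def rbf_kernel(x, z, sigma=1.0):
    """Radial basis kernel: exp(-||x - z||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def eps_insensitive_loss(residual, eps_max):
    """Zero inside the band of half-width eps_max, linear outside it."""
    return max(0.0, abs(residual) - eps_max)

def svr_predict(x, support_vectors, dual_coefs, bias, sigma=1.0):
    """Kernel form of the SVR function.

    dual_coefs[i] stands for the difference of the two Lagrange
    multipliers attached to training sample i. It is an input here;
    this sketch does not solve the dual problem.
    """
    return bias + sum(c * rbf_kernel(x, sv, sigma)
                      for c, sv in zip(dual_coefs, support_vectors))

# Hypothetical toy values, for illustration only.
support = [[0.0], [1.0]]
coefs = [0.5, -0.5]
print(svr_predict([0.0], support, coefs, bias=0.1))
```

In practice the multipliers would come from a quadratic-programming solver or an existing SVR implementation; only the evaluation step is shown here.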

Gaussian process regression
Based on Bayesian theory, GPR is a flexible nonparametric model [31]. The majority of systems can be modeled by using a suitable combination of Gaussian processes (GP), although performance degradation or model mismatch could occur in long-term predictions. The GPR model performs well in multi-step prediction, and it is applicable to modeling the aging process of batteries with strong nonlinearity [32]. The structure of GPR is shown as follows.
In Gaussian process modeling, f(x) is considered as a set of random variables, and the mathematical representation is shown as follows:
$$f(x) \sim \mathcal{GP}\big(m(x), k_f(x, x')\big) \tag{11}$$
The conventional GPR mean function is set to zero, while in this paper, a linear mean function is chosen to obtain better long-term prediction performance. The linear mean function is shown as follows:
$$m(x) = a x + b \tag{12}$$
The covariance function must yield a positive semi-definite matrix, and its main function is to quantify the relationship between points. If the input points are close to each other, the outputs should also be close. Therefore, the squared exponential is chosen as the covariance function in this paper to ensure the prediction performance. The covariance function is shown as follows:
$$k_f(x_i, x_j) = \sigma_f^{2} \exp\left(-\frac{\|x_i - x_j\|^{2}}{2 l^{2}}\right) \tag{13}$$
where $\sigma_f^{2}$ denotes the signal variance, and l denotes the characteristic length scale of each input vector. The set of hyperparameters of the constructed model is $\theta = \{a, b, l, \sigma_f^{2}\}$. To optimize the hyperparameters, maximum likelihood estimation is used in this paper. The method constructs the likelihood function based on the unknown parameters and the sample data, and obtains the optimal hyperparameters by minimizing the negative log marginal likelihood (NLML), which is shown as Eq. 14.
$$\theta^{*} = \arg\min_{\theta} \mathrm{NLML}(\theta) \tag{14}$$
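where NLML is shown as follows.
$$\mathrm{NLML} = -\log p(y \mid x, \theta) = \frac{1}{2}\big(y - m(x)\big)^{T} K^{-1} \big(y - m(x)\big) + \frac{1}{2}\log|K| + \frac{n}{2}\log 2\pi \tag{15}$$
Here $K = K_f(x, x) + \sigma_n^{2} I_n$ is the covariance matrix of the n training outputs, with $\sigma_n^{2}$ the variance of the observation noise. To initialize the parameters, the conjugate gradient algorithm [33] is used to solve the equations, and the minimum of the objective function can be obtained by taking partial derivatives of Eq. 15. The expressions are shown as Eq. 16.
$$\frac{\partial\, \mathrm{NLML}}{\partial \theta_k} = \frac{1}{2}\,\mathrm{tr}\!\left(K^{-1} \frac{\partial K}{\partial \theta_k}\right) - \frac{1}{2}\big(y - m(x)\big)^{T} K^{-1} \frac{\partial K}{\partial \theta_k} K^{-1} \big(y - m(x)\big) \tag{16}$$
where tr denotes the trace of the matrix, and $\theta_k$ is a covariance hyperparameter in the set θ.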
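The observation equation is defined as Eq. 17,
$$y = f(x) + v \tag{17}$$
where v is Gaussian white noise, $v \sim N(0, \sigma_n^{2})$. Thus the prior distribution of y can be written as Eq. 18.
$$y \sim N\big(m(x),\; K_f(x, x) + \sigma_n^{2} \delta_{ij}\big) \tag{18}$$
In Eq. 18, $\delta_{ij}$ is the Kronecker delta, which corresponds to the identity matrix $I_n$. Given the test input $x_*$, the joint prior distribution of y and the prediction set $y_*$ is shown as follows:
$$\begin{bmatrix} y \\ y_* \end{bmatrix} \sim N\left( \begin{bmatrix} m(x) \\ m(x_*) \end{bmatrix}, \begin{bmatrix} K_f(x, x) + \sigma_n^{2} I_n & K_f(x, x_*) \\ K_f(x_*, x) & K_f(x_*, x_*) \end{bmatrix} \right) \tag{19}$$
where $K_f(x, x)$ is the covariance matrix of the training data, $K_f(x, x_*)$ is the covariance matrix of the training input and the test input with $K_f(x, x_*) = K_f(x_*, x)^{T}$, and $K_f(x_*, x_*)$ is the covariance matrix of the test data. The analytic form of the derived posterior distribution is shown as Eq. 20.
$$y_* \mid x, y, x_* \sim N\big(\hat{y}_*, \mathrm{cov}(y_*)\big) \tag{20}$$
The predicted average and the predicted covariance are shown as follows:
$$\hat{y}_* = m(x_*) + K_f(x_*, x)\big[K_f(x, x) + \sigma_n^{2} I_n\big]^{-1}\big(y - m(x)\big) \tag{21}$$
$$\mathrm{cov}(y_*) = K_f(x_*, x_*) - K_f(x_*, x)\big[K_f(x, x) + \sigma_n^{2} I_n\big]^{-1} K_f(x, x_*) \tag{22}$$
The predicted average $\hat{y}_*$ is taken as the output of the prediction model, while $\mathrm{cov}(y_*)$ reflects the uncertainty of the prediction model. According to Eq. 23, the 95% confidence interval CI can be calculated.
$$\mathrm{CI} = \hat{y}_* \pm 1.96 \sqrt{\mathrm{cov}(y_*)} \tag{23}$$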
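The GPR prediction described above (linear mean, squared exponential covariance, posterior mean and covariance, 95% interval) can be sketched in a few lines of NumPy. This is a sketch under stated assumptions: the hyperparameters are fixed by hand instead of being optimized through the likelihood, the input is one-dimensional, and all names and toy values are hypothetical.

```python
import numpy as np

def se_kernel(xa, xb, sigma_f, length):
    """Squared exponential covariance between two 1-D input arrays."""
    diff = xa[:, None] - xb[None, :]
    return sigma_f ** 2 * np.exp(-0.5 * (diff / length) ** 2)

def gpr_predict(x_train, y_train, x_test, a, b, sigma_f, length, sigma_n):
    """Posterior mean, standard deviation and 95% bounds of a GP
    with linear prior mean a * x + b and Gaussian observation noise."""
    k_xx = se_kernel(x_train, x_train, sigma_f, length)
    k_sx = se_kernel(x_test, x_train, sigma_f, length)
    k_ss = se_kernel(x_test, x_test, sigma_f, length)
    noisy = k_xx + sigma_n ** 2 * np.eye(len(x_train))
    resid = y_train - (a * x_train + b)
    mean = a * x_test + b + k_sx @ np.linalg.solve(noisy, resid)
    cov = k_ss - k_sx @ np.linalg.solve(noisy, k_sx.T)
    std = np.sqrt(np.clip(np.diag(cov), 0.0, None))  # guard tiny negatives
    return mean, std, mean - 1.96 * std, mean + 1.96 * std

# Hypothetical capacity-like data: slow linear fade plus a small wiggle.
x = np.array([0.0, 10.0, 20.0, 30.0, 40.0])
y = 2.0 - 0.005 * x + np.array([0.0, 0.004, -0.003, 0.002, -0.001])
mean, std, low, high = gpr_predict(x, y, np.array([20.0, 80.0]),
                                   a=-0.005, b=2.0, sigma_f=0.05,
                                   length=15.0, sigma_n=1e-3)
print(mean, low, high)
```

The interval widens at the test point far from the training data, which is the behavior the confidence interval is meant to express.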

SHAP analysis
SHAP analysis, a method from cooperative game theory, was introduced into the field of machine learning in [34]. The method is able to quantify the contribution of each feature to the trained machine learning model f(x), and it is an interpretation method independent of the type of model. The SHAP value $\varphi_i$ of each feature $x_i$ is calculated as shown in Eq. 24.
$$\varphi_i = \sum_{S \subseteq F \setminus \{i\}} \frac{|S|!\,\big(|F| - |S| - 1\big)!}{|F|!} \Big[ f_{S \cup \{i\}}\big(x_{S \cup \{i\}}\big) - f_S\big(x_S\big) \Big] \tag{24}$$
where F is the set of all features and $S \subseteq F \setminus \{i\}$. For each feature $x_i$, the SHAP analysis evaluates every subset S with and without $x_i$ to calculate its marginal contribution, and the SHAP value for $x_i$ is the weighted average of all these differences.
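The weighted average of marginal contributions can be computed exactly when the number of features is small, as with four HIs. The sketch below enumerates all subsets; it assumes the caller provides a hypothetical `value_fn` that returns the model output when only the features in a given subset are present. How absent features are handled is itself an assumption (here they simply contribute nothing), and this is not the implementation of any particular SHAP library.

```python
from itertools import combinations
from math import factorial

def shapley_values(value_fn, n_features):
    """Exact Shapley values by enumerating all feature subsets.

    value_fn maps a frozenset of feature indices to a model output.
    """
    phi = [0.0] * n_features
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for size in range(len(others) + 1):
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi

# Hypothetical additive toy model: each present feature adds w_j * x_j.
weights = [0.5, 0.3, -0.2]
sample = [1.0, 2.0, 3.0]

def toy_value(subset):
    return sum(weights[j] * sample[j] for j in subset)

phi = shapley_values(toy_value, 3)
print(phi)
```

For an additive model like this toy one, each Shapley value equals the feature's own term, and the values sum to the difference between the full model output and the empty-subset output.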

Unscented Kalman filter
The Kalman filter is a recursive algorithm for estimating the current state using the prior state and the current measured signal. To handle nonlinearity, UKF achieves Gaussian density approximation by obtaining Sigma points through the unscented transformation [35]. The pseudo code of the UKF algorithm is presented in Algorithm 1.
In Algorithm 1, f is the state equation, h is the observation equation, and W and V are mutually independent Gaussian white noises with covariance matrices Q and R, respectively. n is the dimension of the state, and j = 1, 2, ..., 2n + 1 indexes the Sigma points. The unscented transformation also involves a scaling parameter, and weight is the weight corresponding to each of the 2n + 1 Sigma points.
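One predict-and-update cycle of a UKF can be sketched as follows. This is a simplified sketch, not a transcription of Algorithm 1: it uses a single weight set for both mean and covariance, a scaling parameter named `lam` chosen by hand, and additive noise. The function names and the toy model are hypothetical.

```python
import numpy as np

def sigma_points(mean, cov, lam):
    """Return the 2n + 1 sigma points (one per row) and their weights."""
    n = mean.size
    root = np.linalg.cholesky((n + lam) * cov)  # lower-triangular factor
    pts = np.vstack([mean, mean + root.T, mean - root.T])
    w = np.full(2 * n + 1, 1.0 / (2.0 * (n + lam)))
    w[0] = lam / (n + lam)
    return pts, w

def ukf_step(mean, cov, z, f, h, Q, R, lam=1.0):
    """One UKF cycle: propagate through f, then correct with observation z."""
    # Prediction: push sigma points through the state equation.
    pts, w = sigma_points(mean, cov, lam)
    prop = np.array([f(p) for p in pts])
    m_pred = w @ prop
    d = prop - m_pred
    p_pred = d.T @ (w[:, None] * d) + Q
    # Update: redraw sigma points and push them through the observation.
    pts, w = sigma_points(m_pred, p_pred, lam)
    obs = np.array([np.atleast_1d(h(p)) for p in pts])
    z_pred = w @ obs
    dz = obs - z_pred
    dx = pts - m_pred
    s = dz.T @ (w[:, None] * dz) + R      # innovation covariance
    c = dx.T @ (w[:, None] * dz)          # state-observation cross covariance
    gain = c @ np.linalg.inv(s)
    m_new = m_pred + gain @ (np.atleast_1d(z) - z_pred)
    p_new = p_pred - gain @ s @ gain.T
    return m_new, p_new

# Hypothetical toy model: two-component state observed through its sum.
m0, p0 = np.zeros(2), np.eye(2)
m1, p1 = ukf_step(m0, p0, z=1.0, f=lambda s: s, h=lambda s: s[0] + s[1],
                  Q=0.01 * np.eye(2), R=np.array([[0.1]]))
print(m1)
```

Because the toy model is linear, the result coincides with an ordinary Kalman filter update, which gives a convenient sanity check.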

Interpretable online prediction method
In this section, an interpretable method for online RUL prediction is proposed, and the flowchart of the proposed method is shown in Fig. 1. The process is as follows: [...], where Cap tr,i denotes the i-th battery capacity value. The model parameters are optimally selected by the conjugate gradient algorithm. (d) As the capacity prediction model, Model GPR can be obtained.

6. Unscented Kalman filtering:
The actual battery capacity in the training data sets consists of the degraded capacity, the regenerated capacity and the measurement noise. Obtained by Gaussian smoothing and polynomial fitting of the observation, the fitted curve of capacity reflects the capacity degradation, while the fitting error characterizes the effect of capacity regeneration on battery aging. Therefore, a Gaussian non-white noise N is introduced to characterize the effect of capacity regeneration, with its mean and variance taken from the fitting error. N is added as a state quantity to the state space, which can then be constructed as Eq. 25.
where w ∼ N(0, Q) is the state process noise and v ∼ N(0, R) is the corresponding observation noise. Using UKF, the added noise N and the original predicted HIs {HIs} are modified together depending on the capacity of Model GPR. The modified HIs $\{\widetilde{HIs}\}$ and the modified noise $\tilde{N}$ are obtained. This capacity prediction model depends on Model GPR, while the capacity regeneration simulated with the Gaussian non-white noise N occurs at a higher frequency and its characterization of the effect of capacity degradation is incomplete. However, the advantage is that the capacity regeneration simulated by N is sudden and superimposable, which is consistent with the actual phenomenon and can provide guidance. Moreover, through the closed loop of UKF, the modified HIs can reflect the characteristics of capacity regeneration, and the noise N can also be modified in amplitude and frequency.
7. SHAP explainer: From the original predicted HIs and the trained Model GPR, the Shapley values of the features at each cycle i are calculated according to Eq. 24. Based on the obtained SHAP values, the HIs at each cycle are ranked according to the percentage of contribution. Then, the HIs are divided into two groups based on the criteria of the minimum number of features and their total contributions exceeding 50% of the total contributions of all HIs. Specifically, at each cycle i, the HI with the largest SHAP value is selected first, followed by recursive addition of features of the same class (whose SHAP values are of the same sign) until the total contributions of the selected HIs exceed 50% of the total contributions of all HIs. The selected HIs are treated as group I and get the corresponding labels Index1, while the remaining features are treated as group II with the labels Index2. The labels Index1 and Index2 are for the subsequent selection of HIs. Group I, with more contributions, retains the original predicted HIs to keep the output of Model GPR stable and reliable, while group II, with fewer contributions, is replaced by the modified HIs to add the information of capacity regeneration.
8. RUL prediction: After the above steps, Model GPR, the original predicted HIs {HIs}, the modified HIs $\{\widetilde{HIs}\}$ with the noise $\tilde{N}$, Index1 and Index2 are obtained. For each cycle i, HIs are selected from {HIs} and $\{\widetilde{HIs}\}$ according to Index1 and Index2, respectively, to incorporate the information of capacity regeneration while guaranteeing the trend of the predicted capacity. The final combined HIs are $\{\{HIs\}_1, \{\widetilde{HIs}\}_2\}$ and are used as inputs to Model GPR to obtain the final predicted capacity Cap 2. Based on Cap 2, the lower bound of the CIs, CI low, can be calculated according to Eq. 23, while for the upper bound CI up, the predicted capacity Cap 3 is first obtained according to Eq. 26, and the bound is then calculated according to Eq. 23.
Since Cap 2 contains less information about capacity regeneration, Cap 2 is more reliable than Cap 3, and CI low is calculated using Cap 2 to provide early warning of battery failure. Cap 3 is interspersed with more information about artificial capacity regeneration, and it is only used to calculate CI up, to guide the use of batteries in which capacity regeneration is likely to occur and to improve the usage efficiency. When the predicted capacity Cap 2 is less than Cap EOL, the RUL prediction results and the prediction confidence intervals are output.
9. Analysis of experimental results.
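The grouping rule of step 7 and the feature selection of step 8 can be sketched for a single cycle as follows. The sketch rests on assumptions that the text does not settle: contributions are measured by absolute SHAP value, the leading feature is the one with the largest absolute value, and selection simply stops if no further feature of the same sign is available. The function names are hypothetical.

```python
def split_by_contribution(shap_row, threshold=0.5):
    """Split feature indices into group I (Index1) and group II (Index2).

    shap_row holds the SHAP values of all HIs at one cycle.
    """
    total = sum(abs(v) for v in shap_row)
    order = sorted(range(len(shap_row)),
                   key=lambda j: abs(shap_row[j]), reverse=True)
    if total == 0.0:
        return [], order
    lead_is_positive = shap_row[order[0]] >= 0
    index1, covered = [], 0.0
    for j in order:
        if covered > threshold * total:
            break  # group I already exceeds the required share
        if (shap_row[j] >= 0) == lead_is_positive:
            index1.append(j)
            covered += abs(shap_row[j])
    index2 = [j for j in range(len(shap_row)) if j not in index1]
    return index1, index2

def combine_his(predicted, modified, index1):
    """Keep original predicted HIs for group I, modified HIs elsewhere."""
    return [predicted[j] if j in index1 else modified[j]
            for j in range(len(predicted))]

# Hypothetical SHAP values and HI values for one cycle.
idx1, idx2 = split_by_contribution([0.10, 0.05, 0.30, -0.02])
combined = combine_his([1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0], idx1)
print(idx1, idx2, combined)
```

Running the split per cycle yields a label pair for each future cycle, which is then used to assemble the feature vector fed to the capacity model.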

Data description
The experimental data of lithium-ion batteries were acquired from the NASA Ames Prognostics Center of Excellence (PCOE) [36]. The data in this study include three 18650 lithium-ion batteries (B5, B6 and B7), which were tested in three different operating modes: constant current (CC) and constant voltage (CV) charge mode, CC discharge mode and impedance measurement mode.
During CC and CV charging, the batteries were charged at a constant current of 1.5 A until the battery voltage reached 4.2 V, and then continued to be charged in CV until the charging current dropped to 20 mA. During CC discharging, the three batteries B5, B6 and B7 were discharged at a constant current of 2 A until the battery voltage dropped to 2.7 V, 2.5 V and 2.2 V, respectively. The charge and discharge process was repeated to conduct the accelerated aging tests of the battery. After each charge-discharge cycle, the battery impedance was monitored by electrochemical impedance spectroscopy, scanned from 0.1 Hz to 5 kHz.

HIs extraction
The battery aging can be directly characterized by capacity and impedance, but these two indicators cannot be easily measured online because the measurement is complex and expensive [37]. The aging can also be reflected in the variation of the observations that can be measured online, namely the HIs [38]. Therefore, it is essential to extract appropriate HIs from the observations as input features for RUL prediction.
Among the battery operations, researchers usually extract HIs from either the charging or the discharging process alone, which is insufficient to characterize the battery aging [39,40]. In this paper, four appropriate HIs are extracted for online prediction, in which the effect of temperature on aging is included. The following is the analysis of the extracted HIs, taking the B6 battery as an example. Figure 2 shows the signals of B6 at different cycles. The extracted HIs and the predicted HIs with cycle numbers are shown in Fig. 3.
• HI 1: The time interval of equal charging voltage difference [41]. In CC charging, Fig. 2a shows the variation of charging voltage for different cycles. It can be seen that the charging time gradually decreases with the number of charging cycles, which is due to the deepening of battery polarization. In consideration of practical applications, most users do not wait until their devices run out of energy before charging. The charging time from 3.5 V to 4.2 V in CC charging is therefore used as HI 1 to describe the health of the battery, and the extracted series of equal charging voltage rise is shown in Fig. 3a.
• HI 2: The time interval of equal charging current difference [2]. The loss of lithium ions is greater in CV charging. From Fig. 2b, the rate of change of the current gradually slows down as the battery ages. This causes a tendency for the charging time to increase during CV charging, indicating that battery aging severely affects the lithium-ion embedding. Therefore, the time interval from the start of CV charging until the current drops to 100 mA is chosen as HI 2, shown in Fig. 3b.
• HI 3: The time interval of equal discharging voltage difference [42]. With repeated charging and discharging, the battery in devices gradually degrades and the usage time of devices becomes shorter, which reflects the decay of the maximum battery capacity. Figure 2c shows the variation of the discharge voltage of the battery in different cycles. The curve variation is consistent with the capacity decay trend. Therefore, the time interval of equal discharging voltage drop is extracted as HI 3, shown in Fig. 3c, to characterize the battery aging.
• HI 4: The average temperature of equal charging current difference [43]. Temperature is an important indicator of battery aging and can directly reflect the battery impedance, which is mainly composed of a combination of Joule heat and electrochemical reaction heat. Figure 2d shows the temperature variation of the battery during charging at different cycles. In CV charging, the temperature decreases due to the combination of heat dissipation, irreversible exotherm and heat absorption by the electrochemical reaction, which better reflects the battery aging. Therefore, the average temperature from the beginning of CV charging until the current reaches 100 mA is extracted as HI 4, shown in Fig. 3d.
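As an illustration of how a time-interval HI such as HI 1 could be computed from a sampled charging curve, the sketch below finds the interpolated times at which the voltage first rises through 3.5 V and 4.2 V and returns their difference. The helper names, the linear interpolation and the toy samples are assumptions; the paper does not specify its extraction code.

```python
def first_crossing_time(times, values, level, rising=True):
    """Linearly interpolated time at which values first cross level.

    Returns None if no crossing is found, for example when the signal
    already starts beyond the level.
    """
    for k in range(1, len(values)):
        v0, v1 = values[k - 1], values[k]
        crossed = (v0 < level <= v1) if rising else (v0 > level >= v1)
        if crossed:
            frac = (level - v0) / (v1 - v0)
            return times[k - 1] + frac * (times[k] - times[k - 1])
    return None

def equal_voltage_interval(times, voltages, v_start=3.5, v_end=4.2):
    """Time taken for the charging voltage to rise from v_start to v_end."""
    t0 = first_crossing_time(times, voltages, v_start)
    t1 = first_crossing_time(times, voltages, v_end)
    if t0 is None or t1 is None:
        return None
    return t1 - t0

# Hypothetical samples of one CC charging curve (seconds, volts).
t = [0.0, 10.0, 20.0, 30.0, 40.0]
v = [3.4, 3.6, 3.9, 4.1, 4.3]
hi1 = equal_voltage_interval(t, v)
print(hi1)
```

The same crossing helper with `rising=False` would serve for a discharging voltage drop or a falling CV current, under the same assumptions.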

Grey relation analysis
GRA is developed from grey system theory, which mathematically quantifies the geometric relationship between factors with lower requirements for the quality of data [44]. The strength of the relationship between the factors can be assessed by calculating the grey correlation. In this paper, GRA is used to quantify the correlations between the extracted HIs and the capacity. The steps of GRA are shown below:
1. Determination of the analysis sequence: The battery capacity sequence is used as the reference sequence $X_0 = \{x_0(i) \mid i = 1, 2, \ldots, k\}$, and the extracted HIs are used as the comparison sequences $X_n = \{x_n(i) \mid i = 1, 2, \ldots, k\}$, where the sequence length is k and the number of comparison sequence vectors is n.
2. Data preprocessing: The original data are normalized.
3. Calculation of the correlation coefficient: The correlation coefficient between the points of sequences $X_0$ and $X_n$ is calculated by Eq. 27,
$$\zeta_n(i) = \frac{\min_n \min_i |x_0(i) - x_n(i)| + \alpha \max_n \max_i |x_0(i) - x_n(i)|}{|x_0(i) - x_n(i)| + \alpha \max_n \max_i |x_0(i) - x_n(i)|} \tag{27}$$
where α is the resolution coefficient, 0 < α < 1, and α is set to 0.5 in this paper.
4. Correlation calculation: The correlation between $X_0$ and $X_n$ can be calculated by Eq. 28,
$$r_n = \frac{1}{k} \sum_{i=1}^{k} \zeta_n(i) \tag{28}$$
where $r_n$ represents the degree of correlation between the HIs and the capacity. The closer $r_n$ is to 1, the stronger the correlation between the sequences.
Table 1 shows the correlations between the four HIs and the capacity for each of the three batteries. From Table 1, it can be seen that the extracted HIs have a high correlation with the battery capacity. The correlations are all higher than 0.68, so the HIs can be used to characterize the battery aging.
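The GRA steps can be sketched as below. Min-max normalization is an assumption, since the text only states that the data are normalized, and the function names and toy sequences are hypothetical.

```python
def min_max_normalize(seq):
    """Scale a sequence to the range [0, 1]."""
    lo, hi = min(seq), max(seq)
    return [(v - lo) / (hi - lo) for v in seq]

def grey_relational_grades(reference, comparisons, alpha=0.5):
    """Grey relational grade of each comparison sequence to the reference."""
    x0 = min_max_normalize(reference)
    xs = [min_max_normalize(c) for c in comparisons]
    deltas = [[abs(a - b) for a, b in zip(x0, x)] for x in xs]
    d_min = min(min(row) for row in deltas)
    d_max = max(max(row) for row in deltas)
    if d_max == 0.0:
        return [1.0] * len(xs)  # every sequence matches the reference
    grades = []
    for row in deltas:
        coeffs = [(d_min + alpha * d_max) / (d + alpha * d_max) for d in row]
        grades.append(sum(coeffs) / len(coeffs))
    return grades

# Hypothetical capacity sequence and two candidate HIs.
capacity = [2.0, 1.9, 1.8, 1.7]
hi_a = [100.0, 95.0, 90.0, 85.0]   # follows the capacity trend exactly
hi_b = [100.0, 99.0, 90.0, 85.0]   # deviates at the second point
grades = grey_relational_grades(capacity, [hi_a, hi_b])
print(grades)
```

A grade near 1 indicates that the normalized HI follows the normalized capacity closely across all cycles.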

Evaluation criterion
To verify the interpretable online prediction method proposed in this paper, the root mean square error (RMSE), the mean absolute percentage error (MAPE), the absolute error (AE), and the relative accuracy (RA) are used to evaluate the performance of the proposed method, and their expressions are shown as follows.
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{k=1}^{n} \big(Q_k - Q_k^{*}\big)^{2}}$$
$$\mathrm{MAPE} = \frac{1}{n} \sum_{k=1}^{n} \frac{|Q_k - Q_k^{*}|}{Q_k}$$
$$\mathrm{AE} = |RUL_{true} - RUL_{predicted}|$$
$$\mathrm{RA} = 1 - \frac{|RUL_{true} - RUL_{predicted}|}{RUL_{true}}$$
where $Q_k$ and $Q_k^{*}$ denote the true value and the predicted value, respectively, and n is the total number of $Q_k^{*}$. $RUL_{true}$ is the actual RUL and $RUL_{predicted}$ is the predicted RUL. The 95% confidence interval, computed from the predicted average and covariance of the GPR model as in Eq. 23, is used to express the uncertainty of the prediction results.
RMSE is an important indicator for most predictions, which reflects the overall performance of the prediction method, while AE, RA and CIs are more significant in RUL prediction. AE and RA directly correspond to the error in predicted capacity at the time of actual battery failure. The CIs are able to alert the user well in advance of the battery failure, guiding the battery to be utilized efficiently.
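The four point metrics can be written out directly. The sketch below assumes MAPE is reported as a fraction rather than a percentage, which matches the magnitudes quoted in the results, and treats RA as one minus the relative RUL error; the function names and numbers are hypothetical.

```python
import math

def rmse(true_vals, pred_vals):
    """Root mean square error between two equal-length sequences."""
    n = len(true_vals)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(true_vals, pred_vals)) / n)

def mape(true_vals, pred_vals):
    """Mean absolute percentage error, expressed as a fraction."""
    n = len(true_vals)
    return sum(abs(t - p) / abs(t) for t, p in zip(true_vals, pred_vals)) / n

def absolute_error(rul_true, rul_pred):
    """Absolute difference between actual and predicted RUL, in cycles."""
    return abs(rul_true - rul_pred)

def relative_accuracy(rul_true, rul_pred):
    """One minus the relative RUL error; values near 1 are better."""
    return 1.0 - abs(rul_true - rul_pred) / rul_true

# Hypothetical values for illustration.
print(rmse([2.0, 1.0], [1.0, 1.0]), mape([2.0, 1.0], [1.0, 1.0]))
print(absolute_error(50, 46), relative_accuracy(50, 46))
```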

HIs prediction results
In this section, the experiments are conducted using the battery data sets (B5, B6 and B7) from NASA, and three different starting prediction points (T = 70, T = 80 and T = 90) are set for each experiment. The data before T is used as the training set, the number of battery charging and discharging cycles is used as the training input, and the HIs are used as the training output. Based on SVR, the HIs model is constructed, and the predicted HIs after T can be obtained.
Taking the B6 battery as an example, Fig. 3 shows the predicted results of the HIs when the starting prediction point T is 70. The overall changes of the real HI and the predicted HI mostly converge, which indicates that the HI model can capture the trend of HIs well and reflect the battery aging.
The evaluation of the HIs prediction results for the three batteries is shown in Table 2. The largest RMSE in HI 1 is found for the B6 battery, and the largest RMSE in HI 2 is found for the B7 battery, both of which have values over 100. However, due to the large magnitudes of HI 1 and HI 2, their corresponding MAPEs are both below 0.056. The minimum MAPE is 0.004, in HI 3. As T moves later and the number of training samples increases, the RMSE and MAPE of the prediction results get smaller, which means that the prediction results become more accurate. The only exception is that the prediction of HI 3 for the B6 battery worsens at T = 90, which is caused by overfitting. The prediction error of the HIs model at different starting prediction points T is relatively small and the prediction performance is stable. Therefore, the predicted HIs can characterize the battery aging and replace the future aging data for the online prediction.

Table 1. The correlation between the HIs and the capacity.

UKF filtering
Before constructing the state space and filtering, the actual capacity in the training data sets needs to be processed. The observation noise in the data is first removed by Gaussian smoothing, and then polynomial fitting is utilized to obtain the degraded capacity that does not contain the information of capacity regeneration. The average and variance of the error between the fitted curve and the actual capacity are input to the state space as parameters of the Gaussian non-white noise for filtering.
As an example, Fig. 4 shows the smoothing and fitting results of the actual capacity of the B6 battery. From Fig. 4, it can be seen that the smoothed capacity effectively removes the measurement noise from the original signal, while the fitted curve only retains the degraded capacity, and the corresponding fitting error can reflect the capacity regeneration. The Gaussian non-white noise is defined as N i ∼ N(mean f, Q f), where mean f and Q f are the mean and variance matrices of the fitting error, respectively. The filtering of the UKF is then performed, and Fig. 5 shows the modified HIs and the modified Gaussian non-white noise. From Fig. 5a and d, the modified HIs are close to the real HIs in variation, and the modified HIs have obtained the information of capacity regeneration on the basis of the original predicted HIs.
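The preprocessing described here (Gaussian smoothing, polynomial fitting, then the mean and variance of the fitting error) can be sketched with NumPy. The smoothing width, the polynomial degree and the synthetic capacity curve are assumptions chosen only for illustration; the paper does not state the values it used.

```python
import numpy as np

def gaussian_smooth(signal, sigma=2.0, radius=6):
    """Smooth a 1-D signal with a normalized Gaussian window."""
    offsets = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (offsets / sigma) ** 2)
    kernel /= kernel.sum()
    padded = np.pad(signal, radius, mode="edge")  # repeat end values
    return np.convolve(padded, kernel, mode="valid")

def regeneration_noise_stats(cycles, capacity, degree=2, sigma=2.0):
    """Fit a degradation trend and summarize the fitting error.

    Returns the fitted trend plus the mean and variance of
    (actual capacity - trend), which parameterize the added noise.
    """
    smoothed = gaussian_smooth(capacity, sigma)
    coeffs = np.polyfit(cycles, smoothed, degree)
    trend = np.polyval(coeffs, cycles)
    residual = capacity - trend
    return trend, residual.mean(), residual.var()

# Hypothetical capacity curve: linear fade with one regeneration bump.
cycles = np.arange(60, dtype=float)
capacity = 2.0 - 0.005 * cycles
capacity[30:35] += 0.03
trend, noise_mean, noise_var = regeneration_noise_stats(cycles, capacity)
print(noise_mean, noise_var)
```

The returned mean and variance would then serve as the parameters of the noise state before filtering.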

SHAP explanation
From the original predicted HIs, the capacity model of GPR is trained, and the SHAP analysis is performed on the trained model. Again using the B6 battery as an example, Fig. 6 shows the predicted capacity of Model GPR with the results of the SHAP analysis. The sign of a SHAP value does not indicate whether a feature has a good or bad causal effect; it only characterizes how the feature contributes to the output. Thus, the absolute SHAP values can visualize the influence of HIs on Model GPR. As shown in Fig. 6b and c, the SHAP values of HIs keep changing with the battery aging. The SHAP values of the HIs show a significant shift when the cycle i is between 80 and 100. Overall, HI 3 is the most influential among all HIs, while HI 4 is the least influential. Figure 6d shows the variation of SHAP values in relation to the feature values. Each dotted line is a prediction sample, and thus the density of the lines indicates the distribution of the feature values. Among the samples, HI 1 has more points with smaller feature values, whose corresponding SHAP values have a negative influence, while the contrary is the case for HI 2 and HI 4. HI 3 has a more homogeneous distribution of feature values with the corresponding SHAP values, and it plays a more important role in Model GPR. In addition, in the grey correlation analysis, HI 2 has the lowest correlation with the battery capacity, while in the SHAP analysis, the original predicted HI 2 is second only to HI 3 in terms of its overall contribution to Model GPR. The result illustrates that the original predicted HI 2 obtained by SVR contains more information about the battery aging.
After SHAP analysis, the labels Index1 and Index2 can be obtained to get the final HIs, $\{\{HIs\}_1, \{\widetilde{HIs}\}_2\}$. Since traditional RUL prediction is either offline or online step prediction, there are no future HIs (after T) available as feature input in the online prediction. In this paper, the predicted HIs are obtained by SVR, and they are smooth compared to the real HIs. Compared with the prediction results on battery capacity in Fig. 6, the original predicted HIs contain most of the information on battery degradation, while losing part of the information on capacity regeneration and observation noise. The results of SHAP analysis are able to quantify the amount of information on battery degradation contained in different HIs at different future cycles. For the B6 battery, the original predicted HI 3 contains the most information on battery degradation and HI 4 the least during the whole prediction period. By retaining enough degradation information at different future cycles, a reliable degradation capacity can be predicted. Therefore, by combining the original predicted HIs with the modified HIs, $\{\{HIs\}_1, \{\widetilde{HIs}\}_2\}$ contains the majority of the information on both capacity degradation and capacity regeneration.

RUL prediction results
Based on $\{\{HIs\}_1, \{\widetilde{HIs}\}_2\}$, the battery capacity can be predicted by Model GPR. When the predicted capacity is reduced to the threshold Cap EOL, the RUL prediction results are output. Additionally, the CIs can serve as an early warning when their lower bound reaches Cap EOL, while their upper bound, which reflects the capacity regeneration, can directly guide the usage of the battery. In this section, the same three batteries (B5, B6 and B7) are used for experiments to verify the validity and advantages of the proposed method. Three starting prediction points (T = 70, T = 80 and T = 90) are set for each set of experiments. Unlike the traditional proportional division of data, the data are divided in this way in order to verify the proposed method when capacity regeneration occurs at T = 90. The rated capacity of the batteries is 2 Ah, and Cap EOL is set to 1.4 Ah. It should be noted that the Cap EOL of the B7 battery is set to 1.5 Ah because the capacity of the B7 battery did not degrade to 1.4 Ah.
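The threshold rule for turning a predicted capacity curve into an RUL, and the same rule applied to the two CI bounds, can be sketched as follows. The indexing convention (the first list element corresponds to the starting prediction point T) and all numbers are assumptions made for illustration.

```python
def first_cycle_below(capacities, start_cycle, cap_eol):
    """Cycle index at which the capacity first falls below cap_eol."""
    for offset, cap in enumerate(capacities):
        if cap < cap_eol:
            return start_cycle + offset
    return None  # threshold not reached within the prediction horizon

def predict_rul(capacities, start_cycle, cap_eol=1.4):
    """Remaining cycles from start_cycle until capacity drops below cap_eol."""
    eol_cycle = first_cycle_below(capacities, start_cycle, cap_eol)
    return None if eol_cycle is None else eol_cycle - start_cycle

# Hypothetical predicted capacity and CI bounds starting at T = 70 (in Ah).
cap_pred = [1.50, 1.46, 1.43, 1.41, 1.39, 1.37]
ci_low = [1.47, 1.43, 1.39, 1.38, 1.36, 1.34]
ci_up = [1.52, 1.49, 1.47, 1.45, 1.43, 1.41]

rul = predict_rul(cap_pred, 70)
rul_early = predict_rul(ci_low, 70)   # early warning from the lower bound
rul_late = predict_rul(ci_up, 70)     # None: upper bound stays above EOL
print(rul, rul_early, rul_late)
```

A `None` result signals that the curve never reaches the threshold within the horizon, which is how an upper bound lifted by regeneration would appear.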
For comparison, the capacity prediction models all use the predicted HIs (which are obtained through SVR) as the input. Model I (M1) is the proposed interpretable online prediction method. Model II (M2) directly uses the predicted HIs of SVR to train the GPR capacity model for comparison. Model III (M3) is based
For the B5 battery, the capacity predicted by the proposed M1 is closer to the actual capacity and has a narrower lower CI than the other methods. According to Table 3, the RMSEs of M1 at different T are around 0.02, i.e. close to 0. The results indicate that the proposed M1 is less affected by the starting prediction point and has more stable prediction performance. The AEs of M1 are less than 4, the RAs are higher than 0.9, and the CIs are less than 5, which demonstrates the effectiveness of the proposed method. The upper bound of the CIs, which is calculated from the regeneration points (Reg-Points) of capacity, fluctuates continuously and thereby reflects the impact of the capacity regeneration. As can be seen from Figs. 7 to 9, when there is a significant capacity regeneration in the actual capacity, the upper bound almost coincides with it. When the capacity regeneration is smaller, the trend of the upper bound is still similar to the trend of the regenerated capacity. It should be noted that when T = 70, the CIs of RUL by the proposed method M1 do not include the final predicted RUL, precisely because the upper bound of the CI accurately captures the regenerated capacity, and the result can serve as an early warning. Therefore, the upper bound can provide guidance for improving the efficiency of battery utilization.
For the B6 battery, according to Table 4, the proposed method M1 has a narrower CI than the comparison methods. For the proposed M1, the RMSEs are less than 0.04 and the AEs are less than 5. When T = 70, although the AE of M1 is higher than that of M2 by 3, the significantly narrower CI of M1 includes the actual regenerated capacity. When T = 90, the RMSE of the proposed method is 0.004 higher than that of M2, but its AE is smaller and the upper bound of the CI almost captures the capacity regeneration at the starting point of the prediction, which helps the battery management system make timely decisions before battery failure. Although the proposed method can accurately predict RUL and deal with the effect of capacity regeneration, the predicted capacity at the end of the cycle deviates from the actual capacity. This is because the UKF observation is an output of $\mathrm{Model}_{GPR}$, which constrains the accuracy of the predicted capacity as well as the CIs.
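For readers who want to reproduce the evaluation metrics quoted in this section, the following Python sketch shows one common way to compute them. The definitions are assumptions, since they are not spelled out in this section: RMSE is taken over the predicted capacity sequence, AE is the absolute difference between the actual and predicted RUL in cycles, and RA is one minus AE divided by the actual RUL. The function names and the numbers in the example are hypothetical.

```python
import math
from typing import Sequence


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error between actual and predicted capacity."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def absolute_error(rul_actual: int, rul_predicted: int) -> int:
    """Absolute error of the predicted RUL, in cycles (assumed definition)."""
    return abs(rul_actual - rul_predicted)


def relative_accuracy(rul_actual: int, rul_predicted: int) -> float:
    """Relative accuracy, assumed to be 1 - AE / actual RUL."""
    if rul_actual <= 0:
        raise ValueError("actual RUL must be positive")
    return 1.0 - absolute_error(rul_actual, rul_predicted) / rul_actual


# Illustrative numbers only, not taken from the tables.
print(absolute_error(40, 37))      # AE in cycles
print(relative_accuracy(40, 37))   # RA as a fraction of the actual RUL
```

Under these assumed definitions, a small AE relative to the actual RUL gives an RA close to 1, which is consistent with the pattern reported above (AEs below 4 together with RAs above 0.9).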
According to the prediction results for the B7 battery in Figs. 7 to 9, the capacity degradation curves are accurately obtained by the proposed M1, and the lower prediction CI is narrower. Table 5 shows the comparative results for the B7 battery (note: '/' indicates that the capacity predicted by the method did not reach the capacity failure threshold). As shown in Table 5, compared with M2 and M3, the proposed M1 has RMSEs lower than 0.0223 at different T, with more stable prediction performance. The AEs of the proposed M1 are no more than 3, and the RAs are higher than 0.91, with a confidence interval of less than 13. The upper bound of the CI obtained by the proposed method M1 can effectively reflect the impact of capacity regeneration, especially when T = 90 and the capacity regeneration occurs at the starting prediction point. The experimental results demonstrate the advantages of the proposed method.
Considering the application of the proposed method M1 to online RUL prediction, the experimental time of M1 and M2 was used to evaluate the computational burden. The methods were run on a personal
Although the method M1 required approximately 36% more experimental time, its prediction results (RMSE and AE) were significantly better than those of the comparison methods. The computational burden of the proposed method M1 was also acceptable compared to the duration of 13,000-14,000 s for a single battery cycle.
To further verify the proposed method, M1 is compared with mainstream online step-prediction methods, including GPR (M4) 45, bidirectional long short-term memory networks (M5) 21, and a combination of empirical mode decomposition, long short-term memory and GPR (M6) 46. The results are shown in Figs. 10, 11 and 12, and the statistical results are listed in Tables 6 to 8.
When T = 70, the performance of M1, M5 and M6 is stable. According to Fig. 10, the results of the proposed method M1 are significantly better on the two batteries B5 and B7. The RMSEs range from 2% to 4%, which illustrates that the overall prediction performance of M1 is better and that the predicted capacity is closer to the actual capacity. Meanwhile, the AEs of the three batteries are 4, 5 and 2. The AEs corroborate the accuracy of the predicted capacity and indicate that the RUL predicted by the proposed method M1 is reliable. As shown in Fig. 10b, the results of M5 are covered by the prediction confidence intervals of M1. However, the upper CI of the proposed M1 is closer to the actual regenerated capacity and the predicted capacity is more accurate, which supports reliable application of the threshold $Cap_{EOL}$.
When T = 80, all RMSEs are less than 8%, and the predicted capacity of all methods is more accurate than at T = 70. As shown in Fig. 11d, the RMSEs and AEs of the proposed method M1 are significantly smaller than those of the other methods, especially for the B5 and B7 batteries. For the B5 battery, the AE of the proposed M1 is only 2, while the AEs of the other methods are at least 9. For the B7 battery, the proposed method M1 can accurately predict the battery capacity with an RMSE of 1.61% and a corresponding AE of only 1, which indicates that the predicted capacity almost
When T = 90, the AEs of the proposed M1 are 1, 2 and 3 for the three batteries, which is significantly better than the comparison methods. From Fig. 12, it can be seen that when the predicted capacity of the comparison methods is closer to the actual capacity, the corresponding curves all overlap with the confidence intervals of M1 to a large extent. These results first illustrate the accuracy of the proposed method M1. Moreover, since the different methods differ in their ability to learn the overall degradation trend of capacity and the local regeneration phenomenon, the results also support the reliability of the proposed method M1 for RUL prediction. In addition, the proposed method M1 can accurately capture and handle the capacity regeneration that occurs at the starting prediction point for all three batteries, with an upper bound of the CIs that almost coincides with the regenerated capacity. The results demonstrate the effectiveness and advantage of the proposed interpretable online prediction method for RUL.

Conclusion
In this paper, an interpretable online prediction method for the RUL of lithium-ion batteries has been proposed. First, the proposed method extracts four appropriate health indicators to comprehensively characterize battery aging in preparation for online prediction. Second, a hybrid framework of SVR, GPR, UKF and the SHAP explainer is constructed to achieve interpretable and accurate online prediction of RUL. Moreover, the method obtains a narrower lower prediction CI and an upper bound of the CI that reliably reflects the capacity regeneration. Finally, verification experiments are performed using NASA data sets. The RMSEs of the prediction results are less than 4%, and the maximum AE is 5. The experimental results illustrate that the proposed method can remove the redundant information in the training data and add the information on random capacity regeneration. Thus, the constructed HIs and the capacity model can effectively retain the capacity information, with the confidence intervals reflecting the regenerated capacity. The proposed method has the potential to facilitate timely maintenance of lithium-ion batteries and enhance battery utilization efficiency.
The proposed method has demonstrated strong prediction capability. However, it still requires a sufficient amount of high-quality training data and reliable processing hardware. Future research will aim to reduce the data dependency of the method and to increase its robustness to variable operating conditions through transfer learning. High-rate charging and low temperature are widely investigated operating conditions that lead to nonlinear variations in battery characteristics, which have been difficult to account for in RUL prediction. Therefore, subsequent studies will specifically focus on these two extreme conditions. In addition, batteries are usually combined into battery packs by series-parallel connections in devices. Therefore, research on battery packs also needs to be considered in the prediction.

1. Feature extraction and analysis: From constant-current (CC) charging signals, constant-voltage (CV) charging signals, CC discharge signals, and temperature signals, appropriate HIs are extracted to characterize the aging of lithium-ion batteries. The GRA method is used to quantitatively assess the reasonableness and validity of the extracted HIs.
2. Data processing: Normalization of sample data.
3. Parameters initialization: The starting prediction point T and the battery capacity failure threshold $Cap_{EOL}$ are set.
4. HIs modeling: Based on SVR, the HIs model is constructed, and the predicted HIs are obtained and used as the future input in online prediction. The detailed steps are as follows:
   (a) According to T, the data are divided into the training set and the test set.
   (b) The penalty coefficient C and kernel width σ of SVR are initialized.
   (c) The HIs model is trained using the training set $\{i, \mathrm{HIs}_{tr,i}\}_{i=1}^{T-1}$, where i denotes the i-th charging and discharging cycle in the training data sets and $\mathrm{HIs}_{tr,i}$ denotes the HIs corresponding to the i-th cycle.
   (d) The predicted HIs after the point T, $\{\mathrm{HIs}\}$, are obtained from the trained HIs model.
5. Capacity prediction modeling: Taking the historical HIs as the input to the capacity model and the battery capacity as the output, the relationship between HIs and the battery capacity is modeled based on GPR. The detailed steps are as follows:
   (a) According to T, the data are divided into the training set and the test set.
   (b) The hyperparameters $\theta = [a, b, \sigma_f, l]$ of GPR are initialized.
   (c) The capacity model is trained with the training set $\{\mathrm{HIs}_{tr,i}, Cap_{tr,i}\}_{i=1}^{T-1}$
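The data processing and data division steps listed above can be illustrated with a short Python sketch. It is a hedged example rather than the authors' code: min-max scaling is assumed as the normalization, cycles are assumed to be numbered from 1 so that the training set covers cycles 1 to T-1, and the helper names `min_max_normalize` and `split_at_start_point` are hypothetical.

```python
from typing import List, Sequence, Tuple


def min_max_normalize(values: Sequence[float]) -> List[float]:
    """Scale values linearly to [0, 1] (assumed normalization scheme)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant series carries no scale information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def split_at_start_point(series: Sequence[float],
                         start_cycle: int) -> Tuple[List[float], List[float]]:
    """Split a per-cycle series at the starting prediction point T.

    series[0] is assumed to belong to cycle 1, so the training part holds
    cycles 1..T-1 and the test part holds cycles T onward.
    """
    if not 1 <= start_cycle <= len(series):
        raise ValueError("start_cycle must lie within the recorded cycles")
    cut = start_cycle - 1
    return list(series[:cut]), list(series[cut:])


# Illustrative health-indicator values for ten cycles.
hi_series = [2.0, 1.9, 1.8, 1.7, 1.6, 1.5, 1.4, 1.3, 1.2, 1.0]
train, test = split_at_start_point(min_max_normalize(hi_series), start_cycle=8)
print(len(train), len(test))
```

In a stricter online setting the scaling bounds would be computed from the training cycles only, so that no information from cycles after T leaks into the model inputs; the sketch normalizes the whole series for brevity.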

Figure 1. The flowchart of the proposed method.

Figure 2. The voltage, current and temperature signals.

Figure 3. The extracted HIs and the predicted HIs.

Figure 5. The modified HIs and the modified noise.

Figure 7. Online prediction results of RUL when T = 70.

Figure 8. Online prediction results of RUL when T = 80.

Figure 9. Online prediction results of RUL when T = 90.

Table 2. Experimental results of predicted HIs.

Table 3. Experimental results of RUL online prediction for B5 battery.

Table 4. Experimental results of RUL online prediction for B6 battery.

Table 5. Experimental results of RUL online prediction for B7 battery.

Table 6. Experimental results for B5 battery.

Table 7. Experimental results for B6 battery.

Table 8. Experimental results for B7 battery.